Learning advanced SQL the weird (and hard) way

Devoxx France
2024-04-17

Image by Pexels from Pixabay
logo EDB

Who am I

  • Lætitia Avrot
  • PostgreSQL recognized contributor
  • PostgreSQL Europe board member
  • #PostgresWomen co-founder
  • EDB Practice Leader
  • University teacher in Lyon
  • mydbanotebook.org / psql-tips.org
Image by Anemone123 from Pixabay

Learning advanced SQL the weird (and hard) way

  • Some rules
  • Some technics
  • What I learned
logo EDB

Advent of code

  • https://adventofcode.com/
  • Yearly code challenge during holliday season
  • 2 challenges revealed each day
  • Become harder and harder (and harder)
  • Can be solved even years after
logo EDB

Chose your poison language

  • No mandatory language
  • As long as it's a Turing complete language
Image by Pexels from Pixabay
logo EDB

SQL!!!

Image by 8photo on Freepik
logo EDB

Some personal rules

  • SQL only (Postgres flavoured)
    • SQL functions and procedures allowed
    • As many tables as I want
    • Datatypes, operators...
  • No extension
  • Latest version of Postgres
    • So Postgres 16 in Dec. 2022, Postgres 17 in Dec. 2023
logo EDB

Things I learned

  • I know nothing
  • I need the documentation constantly
  • I overcomplicate things
  • SQL is not meant to do that
Image by svklimkin from Pixabay

Structuring my repo

  • Directories!
    • Language
    • Year
    • Day
  • input files
    • example
    • full input
    • input file
Image by Pexels from Pixabay

Structuring my repo

laetitia:~/tech/laetitia/adventofcode/SQL/2023/01|master⚡ ⇒  ll
total 120
drwxr-xr-x  7 laetitia  staff    224 Dec 14 00:14 .
drwxr-xr-x  9 laetitia  staff    288 Dec  7 07:55 ..
-rw-r--r--  1 laetitia  staff     41 Dec  1 06:18 example.csv
-rw-r--r--  1 laetitia  staff  21760 Dec  1 09:36 full.csv
-rw-r--r--  1 laetitia  staff  21760 Dec  1 12:35 input.csv
-rw-r--r--  1 laetitia  staff   2928 Dec  1 14:23 solution.sql
Image by Pexels from Pixabay

Loading data

  • Try to keep the raw input
  • Add an id
    id integer
  • Use copy
    /* Use psql, the best Postgres client */
    \copy input(data) from 'input.csv';
  • You don't really need the file to be called 'csv'
Image by Alfred Derks from Pixabay

Loading data

laetitia:~/tech/laetitia/adventofcode/SQL/2023/01 ⇒ cat example.csv 
1abc2
pqr3stu8vwx
a1b2c3d4e5f
treb7uchet
Image by Alfred Derks from Pixabay

Making the script idempotent

  • Vacuum and analyzing
  • → Can't rollback a single transaction

  • Creating and droping objects is difficult to maintain
  • → Create everything in a schema!

    → Drop the schema (Cascading)!

Image by Andrew Martin from Pixabay

Data types

  • numbers
  • text
  • bit strings
  • ranges
  • geometric datatypes (points, lines, polygons)
Image by Pexels from Pixabay

Performance

  • Vacuum analyze
  • Index if needed
create index on production (minute, bluePrintId);
create index on map ((p[1]),(p[0]));
create index on map using gist (p) where value='#';
Image by Pexels from Pixabay

Most useful tools and tips

Image by Colin Behrens from Pixabay

psql

  • Best SQL client for PostgreSQL
  • Only official client for PostgreSQL
  • Efficient and lightweight
  • Great help on SQL commands
  • Auto-completion
Image by Lætitia Avrot

psql

laetitia=# \h create function
Command:     CREATE FUNCTION
Description: define a new function
Syntax:
CREATE [ OR REPLACE ] FUNCTION
    name ( [ [ argmode ] [ argname ] argtype [ { DEFAULT | = } default_expr ] [, ...] ] )
    [ RETURNS rettype
      | RETURNS TABLE ( column_name column_type [, ...] ) ]
  { LANGUAGE lang_name
    | TRANSFORM { FOR TYPE type_name } [, ... ]
    | WINDOW
    | { IMMUTABLE | STABLE | VOLATILE }
    | [ NOT ] LEAKPROOF
    | { CALLED ON NULL INPUT | RETURNS NULL ON NULL INPUT | STRICT }
    | { [ EXTERNAL ] SECURITY INVOKER | [ EXTERNAL ] SECURITY DEFINER }
    | PARALLEL { UNSAFE | RESTRICTED | SAFE }
    | COST execution_cost
    | ROWS result_rows
    | SUPPORT support_function
    | SET configuration_parameter { TO value | = value | FROM CURRENT }
    | AS 'definition'
    | AS 'obj_file', 'link_symbol'
    | sql_body
  } ...

URL: https://www.postgresql.org/docs/devel/sql-createfunction.html
Image by Lætitia Avrot

psql

laetitia=# \df regexp_substr
                                         List of functions
   Schema   |     Name      | Result data type |             Argument data types             | Type 
------------+---------------+------------------+---------------------------------------------+------
 pg_catalog | regexp_substr | text             | text, text                                  | func
 pg_catalog | regexp_substr | text             | text, text, integer                         | func
 pg_catalog | regexp_substr | text             | text, text, integer, integer                | func
 pg_catalog | regexp_substr | text             | text, text, integer, integer, text          | func
 pg_catalog | regexp_substr | text             | text, text, integer, integer, text, integer | func
(5 rows)
Image by Lætitia Avrot

generate_series

  • Postgres function
  • Generates numbers (or timestamps)
  • between 2 values
  • with a given step

generate_series

select x.*, y.*
from generate_series(minx, maxx) as x(n),
  generate_series(miny, maxy) as y(n)

generate_series

  • Use with ordinality if needed
select int8range(numbers[odd.n]::bigint,
  numbers[odd.n]::bigint + numbers[even.n]::bigint)
from (select * from input where id=1),
  /* cross join but as we have only one row with id=1, we should be ok */
  generate_series(1,20,2) with ordinality as odd(n,id)
  inner join generate_series(2,20,2) with ordinality as even(n,id)
    on odd.id = even.id
  where numbers[odd.n] is not null
    and numbers[even.n] is not null

generated columns

  • Generating ids
create table input (
  id integer generated always as identity);
Image by Lutz Peter from Pixabay

generated columns

  • Overrinding a generated identity
create table input (
  id integer generated always as identity,
  value text);

insert into input(id,value)
  overriding system value
  (select generate_series(0,-1*(rows*9),-1),
    ']'
  from (select count(*) from input where cat = 'crates') as t(rows))
;
Image by Lutz Peter from Pixabay

generated columns

create table input (
  id integer generated always as identity primary key,
  data text not null,
  hand text generated always as
    ((regexp_split_to_array(data, ' '))[1]) stored,
  bet int generated always as
    ((regexp_split_to_array(data, ' '))[2]::int) stored,
  A int generated always as (regexp_count(data,'A')) stored,
  K int generated always as (regexp_count(data,'K')) stored,
  Q int generated always as (regexp_count(data,'Q')) stored,
  J int generated always as (regexp_count(data,'J')) stored,
  nine int generated always as (regexp_count(data,'9')) stored,
  eight int generated always as (regexp_count(data,'8')) stored,
  seven int generated always as (regexp_count(data,'7')) stored,
  six int generated always as (regexp_count(data,'6')) stored,
  five int generated always as (regexp_count(data,'5')) stored,
  four int generated always as (regexp_count(data,'4')) stored,
  three int generated always as (regexp_count(data,'3')) stored,
  two int generated always as (regexp_count(data,'2')) stored
)
Image by Lutz Peter from Pixabay

Regular expressions

  • Very usefull for parsing
  • But still a curse to use
  • ~ and ~* operators
  • regexp_like
  • A lot of functions
Image by Freepik

Regular expressions

  • Regexp functions
    • regexp_count
    • regexp_instr
    • regexp_match, regexp_matches
    • regexp_replace
    • regexp_split_to_array, regexp_split_to_table
    • regexp_substr
Image by Freepik

Regular expressions

regexp_matches(value, '[a-zA-Z]{4}','g')
regexp_count(data,'K')
Image by Freepik

The filter clause

  • Apply a where clause to an aggregate function
  • usefull to pivot a table
  • filter used with count is particular

The filter clause

count(*) filter (where <condition>)
count(case when <condition> then 1 end)
select
  count(*) as unfiltered,
  count(*) filter (where i < 5) AS filtered
FROM generate_series(1,10) AS s(i);
 unfiltered | filtered
 ------------+----------
          10 |        4
 (1 row)

The filter clause

select sum(value) as first_star
from (
    select value,
           count(*) filter (where value is null)
             over (order by id) as grp
    from input
)
group by grp
order by first_star desc
fetch first row only
;

1 subquery and 8 lines of code instead of 4 CTEs and 63 lines of code

CTEs

  • Better readability
  • Helps iterate naturally
  • Chain them!

CTEs

  • Choose your names wisely
  • Name the columns
  • Don't forget the as keyword

CTEs

/* get the lastId of the range + the range number (also known as eflNumber) */
with lastId(id, elfNumber) as (...),
/* get the firstId of the range + the range number (also known as eflNumber) */
firstId(id, elfNumber) as (...),
/* get the elfNumber for each value */
elfNumbers(id, elfNumber) as (...),
/* get the total calories for each elf */
totalCalories(value) as (...)

select max(value) as PartOne
from totalCalories;

Recursive CTEs

with recursive t(n) as (
    values (1)
  union all
    select n+1
    from t
    where n < 100
)
select sum(n) from t;

Another way to recurse

  1. Create a table
  2. Create a stored procedure to populate the table
  3. Create a query to generate calls to the stored procedure
  4. Use \gexec

Another way to recurse

  1. Create a table
create table (n interger);

Another way to recurse

  1. Create a stored procedure to populate the table
create procedure demonstration(myn integer) as
$demonstration$
  update demo set n= myn + 1;
$demonstration$ language sql; 

Another way to recurse

  1. Create a query to generate calls to the stored procedure
  2. Use \gexec
select $$call demonstration($$ || n || $$);$$
from generate_series(1,100) t(n)
\gexec

Removing intermediate output

  • Use \set QUIET on
\set QUIET on
select $$call demonstration($$ || n || $$);$$
from generate_series(1,100) t(n)
\gexec
\set QUIET off

Putting everything together

create table demo (n integer);

create procedure increment(myn integer) as
$increment$
  update demo set n= myn + 1
$increment$ language sql;

\set QUIET on
select $$call demonstration($$ || n || $$);$$
from generate_series(1,100) t(n)
\gexec
\set QUIET off

select sum(n)
from demo;

nullif

  • Standard SQL function
  • If arg1 = arg2, then returns null
  • Else returns arg1
  • Usefull for creating null values out of a parsed file
Image by rarestohanean from Pixabay

nullif

update input
  set
    elf = nullif(substring(value from 2 for 1),' ');
Image by rarestohanean from Pixabay

Comments

  • Don't be shy
  • Helps clarifying your thoughts
  • Helps reading your code
  • Don't remove your former code
Image by Werner Moser from Pixabay

Comments

/* Let's create a map.
 * The cave is 7 units wide. The floor is the 7 points with height 0).
 * We'll also add 5 free lines.
 * Thanks to my dad pointing that out, as the rocks first moves
 * right/left and then down, we have to code the last movement
 * differently. This is not very smart. My dad suggested that we
 * instead make the rock appear on the 5th line after the highest
 * rock which should make all moves behave the same way (and is
 * possible because the rock will always be able to go down on the
 * first move.
 */
Image by Werner Moser from Pixabay

How to learn?

Image by 12019 from Pixabay

The end

  • Don't put pressure on yourself
  • The main goal is having fun (and sometimes learning)
  • Make your code work, not necessarily the smart way
  • If you're curious
  • Thank you!
Image by Jan Vašek from Pixabay

Raffle!

qr code for raffle

Questions?

Image by Tumisu from Pixabay